Learning Approaches to Wrapper Induction

Authors

Gunter Grieser

Steffen Lange

Abstract

The number, size, and dynamics of Internet information sources bear abundant evidence of the need for automation in information extraction (IE). This paper deals with the question of how such extraction mechanisms can be created automatically by invoking learning techniques. The underlying scenario of system-supported IE puts certain constraints on the available training examples. Therefore, the traditional approaches to formal language learning do not capture the kind of problems to be solved when learning the corresponding extraction mechanisms. We illustrate the resulting differences by studying the problem of learning a particular type of extraction mechanism (so-called island wrappers). We show how to decompose this learning problem into different subproblems that can be handled independently and in parallel. Moreover, we relate the learning problems at hand to the problems that learning theory papers originally address and point out what they have in common and where they differ.

Similar resources

Active Learning with Strong and Weak Views: A Case Study on Wrapper Induction

Multi-view learners reduce the need for labeled data by exploiting disjoint subsets of features (views), each of which is sufficient for learning. Such algorithms assume that each view is a strong view (i.e., perfect learning is possible in each view). We extend the multi-view framework by introducing a novel algorithm, Aggressive Co-Testing, that exploits both strong and weak views; in a weak...

View Validation: A Case Study for Wrapper Induction and Text Classification

Wrapper induction algorithms, which use labeled examples to learn extraction rules, are a crucial component of information agents that integrate semi-structured information sources. Multi-view wrapper induction algorithms reduce the amount of training data by exploiting several types of rules (i.e., views), each of which is sufficient to extract the relevant data. All multi-view algorithms re...

Control of Inductive Bias in Supervised Learning using Evolutionary Computation: A Wrapper-Based Approach

In this chapter, I discuss the problem of feature subset selection for supervised inductive learning approaches to knowledge discovery in databases (KDD), and examine this and related problems in the context of controlling inductive bias. I survey several combinatorial search and optimization approaches to this problem, focusing on data-driven, validation-based techniques. In particular, I presen...

Populating Ontologies with Data from OCRed Lists

A flexible, accurate, and efficient method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selectio...

Wrapper Induction for Information Extraction

Wrapper Induction for Information Extraction, by Nicholas Kushmerick. Chairperson of Supervisory Committee: Professor Daniel S. Weld, Department of Computer Science and Engineering. The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate...

Publication date: 2001
